# Long Sequence Processing

**ModernPubMedBERT** · lokeshch19 · Apache-2.0 · 380 downloads · 2 likes
A sentence transformer trained on the PubMed dataset that supports multiple embedding dimensions, suited to biomedical text processing.
Tags: Text Embedding

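A minimal usage sketch, assuming the checkpoint follows the standard sentence-transformers interface and lives at lokeshch19/ModernPubMedBERT (the id is inferred from the listing, not verified); `truncate_dim` is how sentence-transformers selects a reduced embedding dimension when a model is trained to support several:

```python
from sentence_transformers import SentenceTransformer

# Hub id inferred from the listing; truncate_dim assumes the model was
# trained with Matryoshka-style objectives that make smaller dims usable.
model = SentenceTransformer("lokeshch19/ModernPubMedBERT", truncate_dim=256)

sentences = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "The patient's hyperglycemia was managed with metformin.",
]
embeddings = model.encode(sentences)             # shape: (2, 256)
print(model.similarity(embeddings, embeddings))  # cosine similarity matrix
```
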
**Ruri v3 30M** · cl-nagoya · Apache-2.0 · 1,135 downloads · 3 likes
Ruri v3 is a general-purpose Japanese text embedding model built on ModernBERT-Ja. It supports sequences of up to 8192 tokens and delivers top-tier performance on Japanese text embedding tasks.
Tags: Text Embedding, Japanese

**Sapnous VR 6B** · Sapnous-AI · Apache-2.0 · 261 downloads · 5 likes
Sapnous-6B is a vision-language model that pairs visual perception with language understanding through multimodal training.
Tags: Image-to-Text, Transformers, English

**FANformer 1B** · dongyh · MIT · 114 downloads · 2 likes
FANformer-1B is an autoregressive language model that adds a periodicity-modeling mechanism to the Transformer. It has 1.1 billion non-embedding parameters and was trained on 1 trillion tokens.
Tags: Large Language Model, Transformers, English

**CodeModernBERT-Owl** · Shuu12121 · Apache-2.0 · 285 downloads · 5 likes
CodeModernBERT-Owl is pre-trained from scratch for code retrieval and code understanding, supports multiple programming languages, and improves retrieval accuracy.
Tags: Text Embedding, Multilingual

**Zamba 7B v1 Phase1** · Zyphra · Apache-2.0 · 22 downloads · 5 likes
Zamba-7B-v1-phase1 is a hybrid architecture that combines the Mamba state space model with a Transformer: Mamba forms the backbone, a single shared Transformer layer is applied every six Mamba blocks, and training uses next-token prediction.
Tags: Large Language Model, Transformers

**BERT Large Cantonese** · hon9kon9ize · 448 downloads · 8 likes
A large BERT model trained from scratch on Cantonese text, suited to masked language modeling in Cantonese.
Tags: Large Language Model, Transformers, Other

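A sketch of masked-token prediction with the transformers fill-mask pipeline; the hub id hon9kon9ize/bert-large-cantonese is an assumption based on the listing:

```python
from transformers import pipeline

# Hub id assumed from the listing entry above.
fill = pipeline("fill-mask", model="hon9kon9ize/bert-large-cantonese")

# "The weather in Hong Kong is very [MASK]." in Cantonese.
for pred in fill(f"香港嘅天氣好{fill.tokenizer.mask_token}。"):
    print(pred["token_str"], round(pred["score"], 3))
```
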
**Mistral SUPRA** · TRI-ML · Apache-2.0 · 163 downloads · 12 likes
Mistral-SUPRA is a linear RNN initialized from Mistral-7B, combining the parallel training of a Transformer with the recurrent inference of an RNN.
Tags: Large Language Model, PyTorch, English

**Saul Instruct v1 GGUF** · MaziyarPanahi · MIT · 456 downloads · 8 likes
The GGUF-format release of Equall/Saul-Instruct-v1, suited to text generation and available at multiple quantization levels.
Tags: Large Language Model, English

**Mamba 790M HF** · state-spaces · 6,897 downloads · 4 likes
An efficient Mamba sequence model with 790 million parameters, packaged for the Hugging Face transformers library and suited to causal language modeling.
Tags: Large Language Model, Transformers

**Mamba 130M HF** · state-spaces · 46.83k downloads · 56 likes
A 130M-parameter Mamba sequence model with efficient inference, packaged for the transformers library.
Tags: Large Language Model, Transformers

**Mamba 1.4B HF** · state-spaces · 5,431 downloads · 11 likes
An efficient language model built on the state space model (SSM) architecture, with 1.4B parameters, supporting text generation; all three state-spaces checkpoints share the interface shown in the sketch below.
Tags: Large Language Model, Transformers

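The three state-spaces checkpoints above expose the standard causal-LM interface of the transformers library, so generation looks the same at any size; a minimal sketch:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the -hf checkpoints works here; 130m is the smallest.
name = "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("State space models are", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
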
**Rank Zephyr 7B v1 Full GGUF** · MaziyarPanahi · MIT · 708 downloads · 5 likes
A text ranking model based on Mistral-7B, published in multiple quantized GGUF versions for efficient inference.
Tags: Large Language Model, English

**Mixtral 8x7B v0.1 GGUF** · MaziyarPanahi · Apache-2.0 · 128 downloads · 1 like
A GGUF-quantized build of Mixtral-8x7B-v0.1 offering multiple bit widths, suited to text generation; a loading sketch follows below.
Tags: Large Language Model, Multilingual

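GGUF builds are intended for llama.cpp-style runtimes rather than transformers; a sketch with llama-cpp-python, where the repo id comes from the listing and the quantization filename pattern is an assumption (use any variant actually published in the repo):

```python
from llama_cpp import Llama

# Repo id from the listing; the filename glob is an assumed quant level.
llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Mixtral-8x7B-v0.1-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,
)

out = llm("Q: What is a sparse mixture of experts?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```
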
**SauerkrautLM 7B HerO Mistral 7B Instruct v0.1 GGUF** · MaziyarPanahi · Apache-2.0 · 90 downloads · 2 likes
A German/English bilingual model fine-tuned from Mistral-7B-Instruct-v0.1, quantized to GGUF with quantization levels from 2 to 8 bits.
Tags: Large Language Model, Multilingual

**Mamba 1B** · Q-bert · Apache-2.0 · 185 downloads · 28 likes
Mamba-1B is a 1B-parameter language model built on the Mamba architecture, supporting English text generation.
Tags: Large Language Model, Transformers, English

**Dolphin 2.5 Mixtral 8x7B GPTQ** · TheBloke · Apache-2.0 · 164 downloads · 112 likes
Dolphin 2.5 Mixtral 8x7B is Eric Hartford's Mixtral-based model, fine-tuned on several high-quality datasets and suited to a range of natural language processing tasks; this repository carries the GPTQ quantization.
Tags: Large Language Model, Transformers, English

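GPTQ repositories like this one load through transformers when a GPTQ backend is installed (for example the optimum and auto-gptq packages); a sketch, with the repo id taken from the listing:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Requires a GPTQ backend (e.g. optimum + auto-gptq); repo id from the listing.
name = "TheBloke/dolphin-2.5-mixtral-8x7b-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

inputs = tokenizer("Explain GPTQ quantization in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
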
**Mixtral 8x7B Instruct v0.1 HF** · LoneStriker · Apache-2.0 · 45 downloads · 4 likes
Mixtral-8x7B is a pretrained generative sparse mixture-of-experts large language model that outperforms Llama 2 70B on most benchmarks.
Tags: Large Language Model, Transformers, Multilingual

**Jais 30B v1** · inceptionai · Apache-2.0 · 37 downloads · 23 likes
JAIS-30B is a 30-billion-parameter bilingual (Arabic and English) large language model based on the GPT-3 architecture. It uses ALiBi positional embeddings and achieves state-of-the-art performance on Arabic tasks.
Tags: Large Language Model, Transformers, Multilingual

**LLaVA v1.5 13B GPTQ** · TheBloke · 131 downloads · 37 likes
LLaVA v1.5 13B is a multimodal model developed by Haotian Liu that combines visual and language capabilities to understand images and generate text about them; this repository carries the GPTQ quantization.
Tags: Image-to-Text, Transformers

**Jais 13B 8bit** · asas-ai · Apache-2.0 · 72 downloads · 9 likes
A 13-billion-parameter Arabic-English bilingual large language model based on the Transformer architecture, supporting long sequence processing and served here in 8-bit.
Tags: Large Language Model, Transformers, Multilingual

**CodeLlama 34B Instruct GPTQ** · TheBloke · 174 downloads · 75 likes
CodeLlama 34B Instruct is Meta's 34-billion-parameter code generation model, built on the Llama 2 architecture and fine-tuned for programming tasks; this repository carries the GPTQ quantization.
Tags: Large Language Model, Transformers, Other

**Nystromformer 4096** · uw-madison · 74 downloads · 3 likes
A long-sequence Nyströmformer trained on the WikiText-103 v1 dataset, supporting sequences of up to 4096 tokens.
Tags: Large Language Model, Transformers

**Nystromformer 2048** · uw-madison · 38 downloads · 1 like
A Nyströmformer trained on the WikiText-103 dataset, supporting long sequences of up to 2048 tokens.
Tags: Large Language Model, Transformers

**20220415 210530** · lilitket · Apache-2.0 · 20 downloads · 0 likes
A speech recognition model fine-tuned from facebook/wav2vec2-xls-r-2b on the common_voice dataset.
Tags: Speech Recognition, Transformers

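Fine-tuned wav2vec2 checkpoints plug into the transformers ASR pipeline; a sketch, assuming the hub id matches the timestamped run name lilitket/20220415-210530 from the listing:

```python
from transformers import pipeline

# Hub id assumed from the timestamped run name in the listing.
asr = pipeline("automatic-speech-recognition", model="lilitket/20220415-210530")

# Any local audio file works; decoding is handled via ffmpeg.
print(asr("sample.wav")["text"])
```
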
**CPT Large** · fnlp · 122 downloads · 16 likes
A pre-trained unbalanced (asymmetric encoder-decoder) Transformer for Chinese understanding and generation, supporting a range of natural language processing tasks.
Tags: Large Language Model, Transformers, Chinese

**Nystromformer 512** · uw-madison · 1,570 downloads · 2 likes
An efficient Transformer that approximates self-attention with the Nyström method, targeting long-sequence tasks.
Tags: Large Language Model, Transformers

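Since the uw-madison Nyströmformer checkpoints were trained with masked language modeling on WikiText-103, a fill-mask sketch applies (hub ids are assumed to follow the listing names):

```python
from transformers import pipeline

# Hub id assumed from the listing; the 2048 and 4096 variants swap in the same way.
fill = pipeline("fill-mask", model="uw-madison/nystromformer-512")

for pred in fill(f"Paris is the {fill.tokenizer.mask_token} of France."):
    print(pred["token_str"], round(pred["score"], 3))
```
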
**Language Perceiver** · deepmind · Apache-2.0 · 9,840 downloads · 20 likes
A Perceiver IO language model pre-trained with BERT-style masked language modeling. The architecture is modality-agnostic and consumes raw UTF-8 bytes instead of subword tokens.
Tags: Large Language Model, Transformers, English

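Because the tokenizer operates on bytes, masking covers a byte span rather than a word position; a sketch along the lines of the model's documented masked-LM usage (the span offsets below target the trailing " missing." in this particular sentence):

```python
import torch
from transformers import PerceiverForMaskedLM, PerceiverTokenizer

tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
model = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver")

text = "This is an incomplete sentence where some words are missing."
enc = tokenizer(text, padding="max_length", return_tensors="pt")

# Mask the bytes that encode " missing." (all-ASCII, so one byte per character).
enc["input_ids"][0, 52:61] = tokenizer.mask_token_id

with torch.no_grad():
    logits = model(**enc).logits
print(tokenizer.decode(logits[0, 52:61].argmax(dim=-1)))
```
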
**BigBird RoBERTa Large** · google · Apache-2.0 · 1,152 downloads · 27 likes
BigBird is a sparse-attention Transformer that can process sequences of up to 4096 tokens, suited to long-document tasks.
Tags: Large Language Model, English

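A sketch of long-document encoding; transformers selects BigBird's block-sparse attention by default for inputs this long:

```python
from transformers import AutoTokenizer, BigBirdModel

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-large")
model = BigBirdModel.from_pretrained("google/bigbird-roberta-large")

# A synthetic long document; real inputs of up to 4096 tokens work the same way.
long_text = " ".join(["Long documents need sparse attention."] * 400)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```
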
**CPT Base** · fnlp · 37 downloads · 14 likes
An asymmetric pre-trained Transformer for Chinese comprehension and generation tasks.
Tags: Large Language Model, Transformers, Chinese

**BioBERT Large Cased v1.1 SQuAD** · dmis-lab · 1,227 downloads · 18 likes
BioBERT is a BERT-based pretrained language model optimized for biomedical text mining; this checkpoint is fine-tuned on SQuAD for question answering.
Tags: Question Answering

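A sketch of extractive question answering with the transformers pipeline; the hub id is assumed to follow the listing name:

```python
from transformers import pipeline

# Hub id assumed from the listing entry above.
qa = pipeline("question-answering", model="dmis-lab/biobert-large-cased-v1.1-squad")

answer = qa(
    question="What is metformin used to treat?",
    context="Metformin is an oral medication commonly prescribed to treat type 2 diabetes.",
)
print(answer["answer"], round(answer["score"], 3))
```
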
**YOSO 4096** · uw-madison · 2,072 downloads · 0 likes
YOSO is an efficient Transformer variant that reduces self-attention complexity from quadratic to linear via a Bernoulli-sampling attention mechanism, supporting sequence lengths up to 4096.
Tags: Large Language Model, Transformers